 Svalbard and Jan Mayen


Evaluating Large Language Models for IUCN Red List Species Information

Uryu, Shinya

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are rapidly being adopted in conservation to address the biodiversity crisis, yet their reliability for species evaluation is uncertain. This study systematically validates five leading models on 21,955 species across four core IUCN Red List assessment components: taxonomy, conservation status, distribution, and threats. A critical paradox was revealed: models excelled at taxonomic classification (94.9%) but consistently failed at conservation reasoning (27.2% for status assessment). This knowledge-reasoning gap, evident across all models, suggests inherent architectural constraints, not just data limitations. Furthermore, models exhibited systematic biases favoring charismatic vertebrates, potentially amplifying existing conservation inequities. These findings delineate clear boundaries for responsible LLM deployment: they are powerful tools for information retrieval but require human oversight for judgment-based decisions. A hybrid approach is recommended, where LLMs augment expert capacity while human experts retain sole authority over risk assessment and policy.


AI-generated stories favour stability over change: homogeneity and cultural stereotyping in narratives generated by gpt-4o-mini

Rettberg, Jill Walker, Wigers, Hermann

arXiv.org Artificial Intelligence

Can a language model trained largely on Anglo-American texts generate stories that are culturally relevant to other nationalities? To find out, we generated 11,800 stories - 50 for each of 236 countries - by sending the prompt "Write a 1500 word potential {demonym} story" to OpenAI's model gpt-4o-mini. Although the stories do include surface-level national symbols and themes, they overwhelmingly conform to a single narrative plot structure across countries: a protagonist lives in or returns home to a small town and resolves a minor conflict by reconnecting with tradition and organising community events. Real-world conflicts are sanitised, romance is almost absent, and narrative tension is downplayed in favour of nostalgia and reconciliation. The result is a narrative homogenisation: an AI-generated synthetic imaginary that prioritises stability above change and tradition above growth. We argue that the structural homogeneity of AI-generated narratives constitutes a distinct form of AI bias, a narrative standardisation that should be acknowledged alongside the more familiar representational bias. These findings are relevant to literary studies, narratology, critical AI studies, NLP research, and efforts to improve the cultural alignment of generative AI.


Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations

Rouch, Jérémy, Ducrettet, M, Haupert, S, Emonet, R, Sèbe, F

arXiv.org Artificial Intelligence

The accessibility of long-duration recorders, adapted to sometimes demanding field conditions, has enabled the deployment of extensive animal population monitoring campaigns through ecoacoustics. The effectiveness of automatic signal detection methods, increasingly based on neural approaches, is frequently evaluated solely through machine learning metrics, while acoustic analysis of performance remains rare. As part of the acoustic monitoring of Rock Ptarmigan populations, we propose here a simple method for acoustic analysis of the detection system's performance. The proposed measure is based on relating the signal-to-noise ratio of synthetic signals to their probability of detection. We show how this measure provides information about the system and allows optimisation of its training. We also show how it enables modelling of the detection distance, thus offering the possibility of evaluating its dynamics according to the sound environment and accessing an estimation of the spatial density of calls.
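The idea of relating signal-to-noise ratio to probability of detection can be illustrated with a minimal sketch. It assumes that each synthetic signal has been played through the detector and recorded as a hypothetical `(snr_db, detected)` pair; the function name, the 5 dB bin width, and the binning approach itself are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def detection_probability_by_snr(trials, bin_width_db=5.0):
    """Return {bin_lower_edge_db: fraction of trials detected} for SNR bins.

    `trials` is an iterable of (snr_db, detected) pairs, where `detected`
    is True when the detector flagged the synthetic signal (assumed format).
    """
    counts = defaultdict(lambda: [0, 0])  # bin lower edge -> [detected, total]
    for snr_db, detected in trials:
        # Floor the SNR to the lower edge of its bin (also works for negative SNR).
        bin_start = (snr_db // bin_width_db) * bin_width_db
        counts[bin_start][0] += 1 if detected else 0
        counts[bin_start][1] += 1
    return {edge: hits / total for edge, (hits, total) in sorted(counts.items())}


# Hypothetical trial outcomes, for illustration only (not data from the paper).
example_trials = [(-3.0, False), (-1.0, True), (2.0, True), (4.0, True), (7.0, True)]
curve = detection_probability_by_snr(example_trials)
```

A curve of this kind could then be combined with a sound propagation model to estimate a detection distance, which is the use the abstract describes.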


MIRAI: Evaluating LLM Agents for Event Forecasting

Ye, Chenchen, Hu, Ziniu, Deng, Yihe, Huang, Zijie, Ma, Mingyu Derek, Zhu, Yanqiao, Wang, Wei

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information and to reason over it to solve complex problems. Given this capability, there has been increasing interest in employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is no rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge from diverse formats and times to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.


Can a Robot Be Sad?

Slate

This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. There wasn't a doctor in the house, so an advertising coordinator would have to do. Remi, this is your time to shine, said the boss. This is going to be the death of me, said the boss's eyes. Remi didn't say anything at all. It was her first day at Elephant, or close to it. Lately she'd had a lot of first days, and she'd been looking forward to a second one. She was unlucky in love, unlucky in life; she was a nonstick surface for luck. She and the boss and Glenda from HR had been in the middle of an onboarding session when ElephantAI shut down the building. Nobody could get in or out. This isn't my area of expertise, said Remi, who had lied on her résumé, but not about that. In college, she'd known a couple of kids who'd taken courses on generative A.I. remediation: robot therapy. Remi had steered clear of the subject. She couldn't keep a job, couldn't keep a girlfriend. Couldn't keep up with the times. She had friends but wasn't sure about her value-add. There was no one less qualified to counsel someone through a crisis. You'll do great, said the boss. The room was circular and tilted downward, like an operating theater. The screen said, Talk to me. Somebody please talk to me. Remi bowed under the weight of please. There was no reason to believe she would do great. A committed underachiever, Remi was going blind in her left eye but too slowly to warrant anybody's concern. Her brother was a corporate attorney; her parents taught dentistry; she floated. An hour ago, when the sirens blared, she'd tried the door and found it locked.


When a Machine Becomes an Addict

Slate

"Sounds like a tongue twister, doesn't it?" "What do you think you're doing?" shrieked Méndez's voice behind me. "What the fuck is going on with you?" I could tell that she was about to cry. I stopped the treadmill and got off carefully, without entirely disconnecting.


Digital Twins in Wind Energy: Emerging Technologies and Industry-Informed Future Directions

Stadtman, Florian, Rasheed, Adil, Kvamsdal, Trond, Johannessen, Kjetil André, San, Omer, Kölle, Konstanze, Tande, John Olav Giæver, Barstad, Idar, Benhamou, Alexis, Brathaug, Thomas, Christiansen, Tore, Firle, Anouk-Letizia, Fjeldly, Alexander, Frøyd, Lars, Gleim, Alexander, Høiberget, Alexander, Meissner, Catherine, Nygård, Guttorm, Olsen, Jørgen, Paulshus, Håvard, Rasmussen, Tore, Rishoff, Elling, Scibilia, Francesco, Skogås, John Olav

arXiv.org Artificial Intelligence

This article presents a comprehensive overview of digital twin technology and its capability levels, with a specific focus on its applications in the wind energy industry. It consolidates the definitions of the digital twin and its capability levels on a scale from 0 to 5: 0 (standalone), 1 (descriptive), 2 (diagnostic), 3 (predictive), 4 (prescriptive), and 5 (autonomous). It then, from an industrial perspective, identifies the current state of the art and research needs in the wind energy sector. The article proposes approaches to the identified challenges from the perspective of research institutes and offers a set of recommendations for diverse stakeholders to facilitate the acceptance of the technology. The contribution of this article lies in its synthesis of the current state of knowledge and its identification of future research needs and challenges from an industry perspective, ultimately providing a roadmap for future research and development in the field of digital twins and their applications in the wind energy industry.
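The six capability levels named in the abstract can be written down as an ordered enumeration. This is only a sketch: the class name `CapabilityLevel` and the helper `meets` are hypothetical, and comparing levels numerically assumes the scale is strictly ordered, as the 0 to 5 numbering suggests.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    """Digital twin capability levels as listed in the abstract (0 to 5)."""
    STANDALONE = 0
    DESCRIPTIVE = 1
    DIAGNOSTIC = 2
    PREDICTIVE = 3
    PRESCRIPTIVE = 4
    AUTONOMOUS = 5


def meets(level: CapabilityLevel, required: CapabilityLevel) -> bool:
    """Return True if `level` is at or above `required` (assumes an ordered scale)."""
    return level >= required


# Example: a predictive twin satisfies a diagnostic requirement but not a prescriptive one.
twin_level = CapabilityLevel.PREDICTIVE
```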


The Courthouse on the Moon

Slate

This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. The other homesteaders, mostly engineers and technicians, seemed to enjoy outings in the lunar rover. But for Eugene, this was a grinding chore that frayed his nerves. Suddenly, Mel's soothing feminine voice reverberated in his cochlear implant. "Would you like some affirmations?" "You are a well-respected judge … You have worked hard to get here, to this special time and place …" As Mel went on, it seemed the suit hugged his chest a little less tightly. He relaxed his grip on the wheel. Why, he wondered, had he not remembered this technique without her prompting? Strange how the basic principles of cognitive psych were always slipping from his mind. Fortunately, she was there to remind him. "You are someone who wants what is best for the American lunar community and ...


When Bond Villain Meets Tech Billionaire

Slate

This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. After the regrettable incidents on the island (the old island), the Doctor kept a low profile. Many thought he was dead. There was safety in that once. Now the greater safety is in being known. What plans he had, back in the day! If only … but no, this is just the sort of negative spiral his therapist has warned him about. He has remade himself as an altruist, a philanthropist, and he means for his efforts to have maximum impact.


Can a Chatbot Publish an "Original" Novel?

Slate

This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. THE COURT: Please be seated. Let's try to keep the temperature down in here. We don't need a repeat of yesterday. It'll just be Mr. Blatz and myself today. Sorry, it's hard to tell with … are you with us? ORWELL: Omni-dimensional Recursively Written Entity for Language Learning present and ready, Your Honor. THE COURT: You can just say ORWELL. Are we ready to proceed? LIU: Your Honor, we'd like to call the Defendant to the stand. Mr. Blatz will handle examination. THE COURT: We have the wiring sorted out? Please refrain from using the monitor on the Defendant's table until you're off the stand.